Abstract

We investigate the problem of counting co-authorship in order to quantify
the impact and relevance of scientific research output through normalized
h-index and g-index. We use the papers whose authors belong to a subset of
full professors of the Italian Settore Scientifico Disciplinare (SSD) FIS01 -
Experimental Physics. In this SSD two populations, characterized by the
number of co-authors of each paper, can be roughly distinguished. The total
number of citations of each individual, as well as their h-index and g-index,
strongly depends on the average number of co-authors. We show that, in order
to remove the dependence of the various indices on the two populations, the
best way to define a fractional counting of authorship is to divide the
number of citations received by each paper by the square root of the number
of co-authors. This allows us to obtain some information which can be used
for a better understanding of how scientific knowledge is produced through
the process of writing and publishing papers.

The ensemble of papers published by a scientist at a given epoch, and cited
by the scientific community, contains useful information on the impact and
relevance of the research output of the individual. In 2005 J.E. Hirsch
introduced the celebrated h-index [1], which is meant as a measure of
research achievement and depends on both the number of a scientist's
publications and their impact on his or her peers. Simply said, the h-index
is the highest number of papers signed by a scientist that have each received
at least that number of citations. Thus, someone with an h-index equal to H
has published H papers that each have at least H citations. The h-index is a
better measure than other bibliometric parameters: counting total papers
could reward those with many mediocre publications, whereas counting just the
highest-ranked papers may not recognize a large and consistent body of work
during a scientific career.
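
The definition above translates directly into a few lines of code. The
following is a minimal sketch, not taken from the paper: the function name
and the citation counts are invented for illustration.

```python
def h_index(citations):
    """Return the largest H such that H papers have at least H citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


# Hypothetical citation counts for one scientist (not from the dataset).
example = [10, 8, 5, 4, 3, 0]
print(h_index(example))  # 4
```

Here four papers have at least 4 citations each, while the fifth has only 3,
so H = 4.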

The parameter immediately attracted a lot of attention from the scientific
world, policy makers and the public media. The growth of the number of papers
on the h-index is spectacular, and it is practically impossible to present a
complete reference list (e.g. [3, 5, 7]). Scientific news editors [2]
enthusiastically received the new index, and researchers in various fields of
science [5, 10, 11, 12], particularly in the bibliometric research community
[13], started follow-up work. The idea of ranking scientists by a fair
measure stirred the fire, because such rankings could make election
procedures of scientific academies more objective and transparent.

Apart from the simple definition of the h-index, the conclusions of Hirsch's
paper [1], which are based on the analysis of real data, are very
interesting. Hirsch showed that it is hard to inflate one's own h-index, for
example by self-citation, because the parameter relies on how a body of work
is received over time, and it is very hard to manipulate an entire career.
Hirsch suggests that after 20 years in research an H ~ 20 is a sign of
success, and H ~ 40 indicates outstanding scientists likely to be found only
at the major research laboratories. An H ~ 12 should be good enough to secure
university tenure [1]. What is also interesting in the data analysis by
Hirsch is that, applying the method to prominent physicists, 84% of Nobel
prize winners are found to have substantial h-indices H > 30, while prominent
physicists have H > 50 [1], thus indicating that Nobel prizes, or even a
brilliant scientific career, do not originate in one stroke of luck but in a
body of scientific work.

Among others, one of the main, and perhaps the only serious, disadvantages of
the h-index has been revealed by L. Egghe [13, 14], who noted that the
h-index is insensitive to one or several outstandingly highly cited papers.
Indeed, although highly cited papers are important for the determination of
the value H of the h-index, once such a paper is selected to belong to the
top H papers, its actual number of citations is not used anymore: the h-index
calculated in subsequent years remains insensitive to the citations of this
paper, whatever their number. To overcome this disadvantage of the h-index
while keeping its advantages, the g-index has been introduced [12, 13]. Note
that by definition the papers on rank 1, ..., H each have at least H
citations, and hence these papers have, together, at least H^2 citations. The
parameter G defined through the g-index [13] is just the largest rank such
that the first G papers have, together, at least G^2 citations. Obviously
G ≥ H in all cases.
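
As a sketch of this definition, the code below computes G from a list of
citation counts and compares it with H. The function names and the data are
hypothetical, and G is assumed here to be limited by the number of published
papers (some variants of the g-index relax this).

```python
def h_index(citations):
    """Largest H such that H papers have at least H citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest rank G whose top G papers have, together, at least G^2 citations."""
    ranked = sorted(citations, reverse=True)
    g, total = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites  # cumulative citations of the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g


# One outstanding paper and a few weakly cited ones (invented numbers).
career = [100, 2, 1, 1]
print(h_index(career), g_index(career))  # 2 4
```

The single paper with 100 citations leaves H at 2 but lifts G to 4, which is
the sensitivity to highly cited papers that motivates the g-index.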

Actually, scientific work is in general carried out through collaborations
among two or more scientists, so that a lot of attention has been given to
co-authored papers, i.e. papers signed by more than one author [15]. How are
the credits of each author counted? In other words, does every author of an
n-authored paper get a credit of 1 (total counting), or does every author get
a credit of 1/n (fractional counting)? In general fractional counting is
preferred, because it does not increase the total weight of a single paper.
The same question has been posed by Hirsch [1], who states that "... a
scientist with a high H achieved mostly through papers with many co-authors
would be treated overly kindly by his or her H. Subfields with typically
large collaborations (e.g. high-energy experiments) will exhibit larger H
values, and I suggest that in cases of large differences in the number of
co-authors, it may be useful in comparing different individuals to normalize
H by a factor that reflects the average number of co-authors" [1]. Possible
solutions range from the simple division of H by the average number of
researchers in the publications of the Hirsch core [15], to discounting the
h-index for career length, multi-authorship and self-citations [16], to
taking into account the actual number of co-authors and the scientist's
relative position in the byline [17]. Even if all the above proposals present
advantages and disadvantages, in this paper I investigate how the fractional
counting of authorship should simply work on a real case.

Aiming at a more accurate approach to fractional counting, we investigate the
scientific performance of a subset of anonymous individuals. We select
individuals among the Italian full professors of experimental physics
belonging to the Settore Scientifico Disciplinare (SSD) FIS01. This choice is
due to the fact that different individuals, working at experimental
facilities that are not comparable, coexist within this SSD. Using the
Thomson ISI Web of Science database (available at http://isiknowledge.com),
we select all the papers of a subset of N = 60 full professors belonging to
the above-mentioned SSD, which roughly corresponds to 25% of all the full
professors of the SSD. Let us consider, for each j-th individual
(j = 1, 2, ..., 60), the average number M_j of authors per publication and
the total number of citations C_tot of the j-th scientist at a given epoch.
The value M_j is correlated with the number of publications n_j, thus
trivially both the usual h-index and g-index are correlated with M_j, as
results from Fig. 1. As suggested by Hirsch [1], the total number of
citations is linearly related to H^2 through C_tot = α H^2, with a parameter
which results α = 4.45 ± 0.06, in agreement with Hirsch [1]. The value of G^2
is also related to C_tot through the linear relation C_tot = β G^2, where
β = 1.68 ± 0.02 (not shown here).
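
The text does not say how the proportionality constants were fitted. One
simple possibility, assumed here only for illustration, is a least-squares
fit of a line through the origin, for which the slope has a closed form. The
(H, C_tot) pairs below are invented and are not the values of the dataset.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope a for the model y = a * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical (H, C_tot) pairs, made up for this example.
h_values = [10, 20, 30]
total_citations = [450, 1800, 4050]

# Regress C_tot on H^2 to estimate the coefficient of C_tot = alpha * H^2.
alpha = fit_through_origin([h * h for h in h_values], total_citations)
print(alpha)  # 4.5 for this made-up data
```

The same function applied to G^2 instead of H^2 would give an estimate of the
second coefficient.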

What is interesting from Fig. 1 is that, as naively expected, two different
populations are present within the SSD FIS01, which differ in the average
number of co-authors per paper. The two populations belong to the same SSD,
as is well known, even if, as shown here, n_j and both H and G are strongly
dependent on M_j. In other words, the larger the number of co-authors, the
higher the parameters which denote scientific performance. It goes without
saying that it is much easier to get a high h-index when one has written many
papers with many collaborators. Note that this is crucial if we conjecture
that funding, tenure positions, etc. could be attributed on the basis of
scientific performance. For example, it could be conjectured that an
individual must have an h-index greater than a threshold value, H > H_th, in
order to access a position of full professor in the SSD FIS01. It is clear
that the inhomogeneity due to the two populations within FIS01 would make an
objective evaluation meaningless. In order to avoid rejecting a priori the
use of an objective scientometric index, we must investigate the problem of
co-authorship.

I propose to weight each i-th paper of the j-th individual according to a
power of its number of co-authors (m_i is the number of co-authors of the
i-th paper), and to compare the fractional indices which result from this
operation. More formally, let us consider the weighted number of citations
for each paper, c_i / m_i^μ, where c_i is the number of citations collected
by the i-th paper and μ is an exponent between 0 and 1. The fractional
indices h_μ and g_μ are the h-index and g-index computed from these weighted
citations.

A moment of reflection suffices to realize that the maximum weight μ = 1 has
the same effect as no weight μ = 0, because the two populations would still
persist, even if with their dependence on M_j roughly reversed. The most
useful way to proceed is then to find the value of the parameter 0 < μ < 1
such that the resulting indices are independent of M_j. In other words, we
calculate a value μ* which minimizes the dependence of the parameters on M_j.
Looking at Fig. 1 we can conjecture that there roughly exists a linear
relation between h_μ and M_j, and between g_μ and M_j. Then, by using the
relation
(the index p stands for both h and g), we can simply define a best parameter
through

    μ*(N) = arg min_μ { f_g(N, μ) + f_h(N, μ) }    (4)

Using our dataset of N = 60 individuals as an example, we find that the
best-fit parameter which minimizes the dependence on M_j is
μ* ~ 0.53 ± 0.01, curiously close to 1/2. In Fig. 2 we report the values of
both h_μ and g_μ as a function of M_j, where the independence of both
normalized indices from M_j is clearly shown. It is worth reporting that the
squares of the fractional indices are also linearly related to the fractional
number C_tot^(μ) of total citations, obtained by summing the fractional
citations over all papers. This is expressed through the linear relations
C_tot^(μ) = α_μ h_μ^2 = β_μ g_μ^2, where α_μ = 5.45 ± 0.08 and
β_μ = 2.04 ± 0.07.
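
The procedure can be sketched in code as follows. This is only an
illustration under stated assumptions: the function f_p entering Eq. (4) is
not reproduced in this text, so the absolute slope of a straight-line fit of
each fractional index against the average number of authors is used as a
stand-in measure of dependence. All names and the synthetic careers are
hypothetical, and the example is not expected to reproduce the value 0.53.

```python
def h_index(values):
    """Largest H such that H papers have a (weighted) citation count >= H."""
    ranked = sorted(values, reverse=True)
    return sum(1 for rank, v in enumerate(ranked, start=1) if v >= rank)


def g_index(values):
    """Largest G (at most the number of papers) whose top G papers total >= G^2."""
    ranked = sorted(values, reverse=True)
    g, total = 0, 0.0
    for rank, v in enumerate(ranked, start=1):
        total += v
        if total >= rank * rank:
            g = rank
    return g


def fractional_indices(papers, mu):
    """papers: list of (citations, n_authors); each paper counts c / m**mu."""
    weighted = [c / m ** mu for c, m in papers]
    return h_index(weighted), g_index(weighted)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def dependence(scientists, mu):
    """Stand-in measure (an assumption): |slope of h_mu vs M| + |slope of g_mu vs M|."""
    mean_authors = [sum(m for _, m in papers) / len(papers) for papers in scientists]
    pairs = [fractional_indices(papers, mu) for papers in scientists]
    hs = [h for h, _ in pairs]
    gs = [g for _, g in pairs]
    return abs(ols_slope(mean_authors, hs)) + abs(ols_slope(mean_authors, gs))


def best_mu(scientists, grid):
    """Grid value of mu with the weakest dependence on the mean number of authors."""
    return min(grid, key=lambda mu: dependence(scientists, mu))


def synthetic_scientist(n_papers, n_authors, top_citations):
    # Citations fall off linearly with rank; purely illustrative.
    return [(max(top_citations - 2 * k, 0), n_authors) for k in range(n_papers)]


scientists = [
    synthetic_scientist(40, 3, 40),     # small groups
    synthetic_scientist(50, 5, 50),
    synthetic_scientist(150, 60, 200),  # large collaborations
    synthetic_scientist(200, 100, 300),
]
grid = [k / 20 for k in range(21)]  # mu = 0.00, 0.05, ..., 1.00
mu_star = best_mu(scientists, grid)
print(mu_star, fractional_indices(scientists[0], mu_star))
```

A grid scan is used because the indices are integers, so the dependence
measure is piecewise constant in the exponent and not suited to
gradient-based minimization.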

The determination of the typical values h* and g*, and mainly of their
fluctuations within a given ensemble of individuals, should be of some
practical interest. To evaluate these values we can build up the histograms
of both h_μ and g_μ calculated for μ = 0.53, which are reproduced in Fig. 3.
The empirical distributions of both normalized indices can be very well
reproduced through a Cauchy-Lorentz distribution function

The positions p* of the maxima of the distributions can be considered as the
typical h-index and g-index for the class of scientists at hand, while
typical fluctuations are described by the values of σ_p. In our example, the
best fit corresponds to h* ~ 6.80 ± 0.01 with σ_h ~ 4.0 ± 0.1, and
g* ~ 11.70 ± 0.07 with σ_g ~ 8.5 ± 0.6.
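
The explicit form of the fitted function is not reproduced in this text. As
an assumption, the sketch below uses the standard normalized Cauchy-Lorentz
density, in which the width parameter is the half-width at half-maximum; the
paper may use a different normalization or definition of the width. The
numerical values are the best-fit values quoted above for the h-index.

```python
import math


def cauchy_lorentz(x, center, width):
    """Normalized Cauchy-Lorentz density peaked at `center`, half-width `width`."""
    return width / (math.pi * ((x - center) ** 2 + width ** 2))


h_star, width_h = 6.80, 4.0  # typical value and width quoted in the text

peak = cauchy_lorentz(h_star, h_star, width_h)
half = cauchy_lorentz(h_star + width_h, h_star, width_h)

# In this parametrization the density drops to half of its maximum
# at a distance of one width from the peak.
print(peak, half / peak)
```

With this convention, the interval from h* - width to h* + width contains
half of the total probability, which gives a concrete meaning to the
"typical fluctuation" of the index.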

The information we obtained can be used to infer something about the
processes through which scientific knowledge is produced. The fact that the
best way to overcome the difficulty of co-authorship seems to be weighting
each paper by the square root of the number of authors of that paper is quite
evocative of random-walk dynamics. This should perhaps indicate that in big
experimental collaborations, whose output is a paper with a lot of
co-authors, the effective work is carried out independently by relatively
small groups of scientists, as usually happens in smaller laboratories within
universities. Moreover, the occurrence of a Cauchy-Lorentz distribution for
the normalized indices indicates that the various scientists tend to
differentiate enough to generate a process of homogeneous broadening. Very
interestingly, the fact that σ_g > σ_h in the distribution functions means
that the normalized g-index is the result of a larger broadening with respect
to the h-index. This indicates that a successful scientific career is
actually the result of a few research papers with a great impact and some
more papers with fewer citations.

In conclusion, I investigated the problem of how to weight a co-authored
paper in order not to reject a priori the possibility of objectively using
parameters such as the h-index or the g-index. I introduced the fractional
indices h_μ and g_μ, built up by weighting the citations of the i-th paper
with a power μ of the number of co-authors. The best-fit parameter which
minimizes the strong dependence of the number of papers, citations and
indices on the average number of co-authors is close to μ ~ 1/2. More
interestingly, we found that, at least within the SSD FIS01 where two
populations of scientists coexist, the above fractional counting can give
rise to a single population. The information on the distribution functions of
the normalized indices could be very useful, for example, during the
selection procedures of scientific academies, research funding and tenure
decisions, which are often seen as opaque, clubby and capricious. In fact, a
hypothetical committee would be free to use threshold values
h_th = h* - r σ_h and g_th = g* - r σ_g, where r is an arbitrary parameter,
as one of the objective criteria to select younger scientists. Of course this
is just one of the possible ways to overcome the problem, and different
methods can be investigated, although they should likewise address the
presence of a double population. Bibliometric indicators such as the
normalized h-index and g-index, which as we showed are useful parameters to
evaluate the output of science and which give us some information about the
way scientists actually work, cannot be considered as the only yardstick to
evaluate the career of an individual.

Figure 1: In the upper panel we report the number of papers as a function of
the average number of co-authors for each individual. In the lower panel we
report both the indices H (squares) and G (triangles) as a function of the
average number of co-authors for each individual.

Figure 2: In the upper panel we report the values of both H (white symbols)
and of its normalized value h_μ (black symbols) for μ = 0.53. In the lower
panel we report the values of both G (white symbols) and of its normalized
value g_μ (black symbols) for μ = 0.53.

Figure 3: We report the binned values of both h_μ (circles) and g_μ (squares)
for μ = 0.53. Superimposed as full lines we report the fitted Cauchy-Lorentz
functions L_h(x) and L_g(x) (see text).